Document Categorizer Agent for Computer Science Academic Papers
Authors
Abstract
This paper presents a Document Categorizer Agent that categorizes computer science academic papers in .pdf format, such as journal articles and conference proceedings. We propose using a set of terms stored in a database to categorize these papers, and we draw on a few methods and algorithms from related work to improve the categorization process. Categorization proceeds by parsing the document, calculating the frequency of each term, and matching the terms found against the term set stored in the database. The agent assigns each text document to a predetermined category based on the extracted keywords. We evaluated the agent on a number of computer science papers, and the results show that the term database can be used to categorize documents. This can make searching more efficient and save users time in finding the desired document.
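The pipeline described in the abstract (parse, count term frequencies, match against a stored term set) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the in-memory `TERM_DATABASE`, its categories and terms, the function names, and the scoring rule (sum of matched term frequencies) are all assumptions made for illustration. PDF text extraction is left out; the sketch assumes the paper's text is already available as a string.

```python
import re
from collections import Counter

# Hypothetical stand-in for the term database: category -> set of terms.
# The categories and terms here are invented for illustration only.
TERM_DATABASE = {
    "machine learning": {"classifier", "training", "neural", "regression"},
    "databases": {"query", "transaction", "index", "schema"},
    "networking": {"packet", "router", "latency", "protocol"},
}


def term_frequencies(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def categorize(text, term_database=TERM_DATABASE):
    """Score each category by the summed frequency of its terms in the text.

    Returns (best_category, scores). best_category is None when no term
    from any category occurs. Ties go to the category listed first.
    """
    freqs = term_frequencies(text)
    scores = {
        category: sum(freqs[term] for term in terms)
        for category, terms in term_database.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    sample = "The router forwards each packet; packet latency depends on the protocol."
    category, scores = categorize(sample)
    print(category, scores)
```

Summing raw frequencies is the simplest possible matching rule. A real system would likely normalize for document length and handle multi-word terms, but the abstract does not say how the agent does this.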
Similar resources
Evaluation of the Document Categorization in "Fixed-point Observatory"
“Fixed-point observatory” is a prototype that helps users grasp recent trends in their fields of interest from large-scale information. It consists of a content-based categorizer, a named-entity-based categorizer, and a multiple-document summarizer. We have evaluated the content-based categorizer, which adopts the simple “bag-of-words” model. Though the quality seems to be sufficient for rough cla...
Algorithmic Detection of Computer Generated Text
Computer generated academic papers have been used to expose a lack of thorough human review at several computer science conferences. We assess the problem of classifying such documents. After identifying and evaluating several quantifiable features of academic papers, we apply methods from machine learning to build a binary classifier. In tests with two hundred papers, the resulting classifier ...
Experiments with HITEC: a Hierarchical Text Categorizer
This paper presents experiments on the effectiveness of the HITEC software (HIerarchical TExt Categorizer) on several natural languages (English, German) and with various kinds of text corpora. HITEC applies the UFEX (Universal Feature EXtractor) method for hierarchical text categorization. The obtained results show that HITEC outperforms its known competitors on the investigated corpora, its...
Preparation of Papers for the IAENG International Journal of Computer Science
These instructions give you guidelines for preparing papers for the journal IAENG International Journal of Computer Science. Use this document as a template if you are using LaTeX. Motion tracking and object recognition often use cameras that are mounted in motion platforms like pan-tilt units, linear tables and even robots. Tracking can be automated by visually servoing the platform’s degrees-o...
Networks of reader and country status: an analysis of Mendeley reader statistics
The number of papers published in journals indexed by the Web of Science core collection is steadily increasing. In recent years, nearly two million new papers were published each year; somewhat more than one million papers when primary research papers are considered only (articles and reviews are the document types where primary research is usually reported or reviewed). However, who reads the...